 minimum point



Simultaneous Optimization of Geodesics and Fréchet Means

Rygaard, Frederik Möbius, Hauberg, Søren, Markvorsen, Steen

arXiv.org Machine Learning

A central part of geometric statistics is to compute the Fréchet mean. This is a well-known intrinsic mean on a Riemannian manifold that minimizes the sum of squared Riemannian distances from the mean point to all other data points. The Fréchet mean is simple to define and generalizes the Euclidean mean, but for most manifolds even minimizing the Riemannian distance involves solving an optimization problem. Therefore, numerical computations of the Fréchet mean require solving an embedded optimization problem in each iteration. We introduce the GEORCE-FM algorithm to simultaneously compute the Fréchet mean and Riemannian distances in each iteration in a local chart, making it faster than previous methods. We extend the algorithm to Finsler manifolds and introduce an adaptive extension such that GEORCE-FM scales to a large number of data points. Theoretically, we show that GEORCE-FM has global convergence and local quadratic convergence and prove that the adaptive extension converges in expectation to the Fréchet mean. We further empirically demonstrate that GEORCE-FM outperforms existing baseline methods to estimate the Fréchet mean in terms of both accuracy and runtime.
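To make the definition concrete, here is a minimal sketch of a Fréchet mean computation on the unit sphere, where geodesic distances and the exponential and logarithm maps have closed forms. It is a plain Riemannian gradient descent, not the GEORCE-FM algorithm from the abstract; the helper names (`log_map`, `exp_map`, `frechet_mean_sphere`) and the sample points are assumptions chosen for illustration. On a general manifold each `log_map` call would itself require solving for a geodesic, which is the embedded optimization problem the abstract refers to.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_map(mu, x):
    """Tangent vector at mu pointing to x, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, dot(mu, x)))
    theta = math.acos(c)                      # geodesic distance on the unit sphere
    tangent = [xi - c * mi for xi, mi in zip(x, mu)]
    norm = math.sqrt(dot(tangent, tangent))
    if norm < 1e-15:                          # x coincides with mu (antipodal case not handled)
        return [0.0, 0.0, 0.0]
    return [theta * t / norm for t in tangent]

def exp_map(mu, v):
    """Follow the geodesic from mu in direction v for length |v|."""
    norm = math.sqrt(dot(v, v))
    if norm < 1e-15:
        return list(mu)
    p = [math.cos(norm) * m + math.sin(norm) * vi / norm for m, vi in zip(mu, v)]
    scale = math.sqrt(dot(p, p))              # renormalize against rounding drift
    return [pi / scale for pi in p]

def frechet_mean_sphere(points, n_iters=100, tol=1e-12):
    """Riemannian gradient descent on the sum of squared geodesic distances."""
    mu = list(points[0])
    for _ in range(n_iters):
        logs = [log_map(mu, x) for x in points]
        step = [sum(l[d] for l in logs) / len(points) for d in range(3)]
        if math.sqrt(dot(step, step)) < tol:
            break
        mu = exp_map(mu, step)
    return mu

# Four points placed symmetrically around the north pole (colatitude 0.3 rad).
colat = 0.3
points = [
    (math.sin(colat) * math.cos(phi), math.sin(colat) * math.sin(phi), math.cos(colat))
    for phi in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)
]
mu = frechet_mean_sphere(points)
```

By symmetry, the Fréchet mean of these four points is the north pole, so `mu` should end up close to `(0, 0, 1)`.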


De-singularity Subgradient for the $q$-th-Powered $\ell_p$-Norm Weber Location Problem

Lai, Zhao-Rong, Wu, Xiaotian, Fang, Liangda, Chen, Ziliang, Li, Cheng

arXiv.org Artificial Intelligence

The Weber location problem is widely used in several artificial intelligence scenarios. However, the gradient of the objective does not exist at a considerable set of singular points. Recently, a de-singularity subgradient method has been proposed to fix this problem, but it can only handle the $q$-th-powered $\ell_2$-norm case ($1\leqslant q<2$), which has only finitely many singular points. In this paper, we further establish the de-singularity subgradient for the $q$-th-powered $\ell_p$-norm case with $1\leqslant q\leqslant p$ and $1\leqslant p<2$, which covers all the remaining unsolved cases of this problem. This is a challenging task because the singular set is a continuum. The geometry of the objective function is also complicated, so that characterizing the subgradients, the minimum, and the descent direction is very difficult. We develop a $q$-th-powered $\ell_p$-norm Weiszfeld Algorithm without Singularity ($q$P$p$NWAWS) for this problem, which ensures convergence and the descent property of the objective function. Extensive experiments on six real-world data sets demonstrate that $q$P$p$NWAWS successfully solves the singularity problem and achieves a linear computational convergence rate in practical scenarios.
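For orientation, the following is a minimal sketch of the classical Weiszfeld iteration for the simplest case ($q=1$, $\ell_2$ norm, unit weights), i.e. the geometric median. It is not the $q$P$p$NWAWS algorithm from the abstract: the function name `weiszfeld`, the `eps` guard, and the sample points are assumptions for illustration. The guard only avoids a division by zero when an iterate lands on a data point, which is exactly the singularity that de-singularity subgradient methods treat properly.

```python
import math

def weiszfeld(points, start, n_iters=200, tol=1e-12, eps=1e-12):
    """Classical Weiszfeld iteration for the geometric median (l2 norm, q = 1)."""
    y = list(start)
    for _ in range(n_iters):
        num = [0.0] * len(y)
        den = 0.0
        for p in points:
            d = math.dist(y, p)
            w = 1.0 / max(d, eps)   # crude guard: the true gradient does not exist at d == 0
            den += w
            for k in range(len(y)):
                num[k] += w * p[k]
        new_y = [n / den for n in num]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y

# Corners of a square; by symmetry the geometric median is the center (1, 1).
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
median = weiszfeld(points, start=(0.3, 0.2))
```

Each update is a weighted average of the data points, with weights inversely proportional to the current distances, so the weights blow up as an iterate approaches a data point.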


Local Complexity of Stochastic Convex Optimization

Neural Information Processing Systems

We extend the traditional worst-case, minimax analysis of stochastic convex optimization by introducing a localized form of minimax complexity for individual functions. Our main result gives function-specific lower and upper bounds on the number of stochastic subgradient evaluations needed to optimize either the function or its "hardest local alternative" to a given numerical precision. The bounds are expressed in terms of a localized and computational analogue of the modulus of continuity that is central to statistical minimax analysis. We show how the computational modulus of continuity can be explicitly calculated in concrete cases, and relates to the curvature of the function at the optimum. We also prove a superefficiency result that demonstrates it is a meaningful benchmark, acting as a computational analogue of the Fisher information in statistical estimation. The nature and practical implications of the results are demonstrated in simulations.


The Novel Adaptive Fractional Order Gradient Decent Algorithms Design via Robust Control

Liu, Jiaxu, Chen, Song, Cai, Shengze, Xu, Chao

arXiv.org Artificial Intelligence

The vanilla fractional order gradient descent may oscillate and converge only to a region around the global minimum instead of converging to the exact minimum point, or may even diverge, in the case where the objective function is strongly convex. To address this problem, a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method are proposed in this paper. Inspired by the quadratic constraints and Lyapunov stability analysis from robust control theory, we establish a linear matrix inequality to analyse the convergence of our proposed algorithms. We prove that the proposed algorithms can achieve R-linear convergence when the objective function is $L$-smooth and $m$-strongly convex. Several numerical simulations are presented to verify the effectiveness and superiority of our proposed algorithms.


Algorithmic Trading Models - Machine Learning

#artificialintelligence

In the fifth article of this series, we will continue to summarise a collection of commonly used technical analysis trading models that will steadily increase in mathematical and computational complexity. Typically, these models are likely to be most effective around fluctuating or periodic instruments, such as forex pairs or commodities, which is what I have backtested them on. The aim behind each of these models is that they should be objective and systematic, i.e., we should be able to translate them into a trading bot that will check some conditions at the start of each time period and decide whether a buy or sell order should be posted or whether an already open trade should be closed. Please note that not all of these trading models are successful. In fact, a large number of them were unsuccessful.


Early Stopping Explained!

#artificialintelligence

Early stopping is one of the simplest and most effective regularization techniques used in training neural networks. Usually, during training, the training loss will decrease gradually, and if everything goes well on the validation side, the validation loss will decrease too. Once the validation loss reaches its minimum point, it starts to increase again, which is a signal of overfitting. How can we stop the training right before the validation loss rises again? Or before the validation accuracy starts decreasing?
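One common answer is a patience rule: stop once the validation loss has failed to improve for a fixed number of consecutive epochs, and keep the model from the best epoch. The sketch below is framework-independent and works on a plain list of per-epoch validation losses; the function name `early_stopping_epoch`, the parameters `patience` and `min_delta`, and the loss values are assumptions for illustration, not taken from the article.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch (a real loop would checkpoint the model here).
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Validation loss falls until epoch 3, then rises again.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch, best_epoch)  # prints "5 3"
```

Training stops at epoch 5, after two epochs without improvement, and the weights saved at epoch 3 are the ones to keep. A larger `patience` tolerates noisier validation curves at the cost of extra epochs.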


A Gentle Introduction to Particle Swarm Optimization

#artificialintelligence

Particle swarm optimization (PSO) is a simple bio-inspired algorithm that searches for an optimal solution in the solution space. It differs from other optimization algorithms in that it needs only the objective function and does not depend on the gradient or any differential form of the objective. It also has very few hyperparameters. In this tutorial, you will learn the rationale of PSO and its algorithm with an example. Particle Swarm Optimization was proposed by Kennedy and Eberhart in 1995.
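As a rough illustration of the idea, here is a minimal sketch of a common PSO variant with an inertia weight. The function name `pso`, the hyperparameter values (`w`, `c1`, `c2`), the fixed seed, and the test objective are assumptions for illustration and are not taken from the tutorial. Note that only objective values are used; no gradient is evaluated anywhere.

```python
import random

def pso(objective, bounds, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box; positions are not clamped after initialization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(p):
    """Simple test objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

best_pos, best_val = pso(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

The three terms in the velocity update are the whole algorithm: `w` controls how much momentum a particle keeps, while `c1` and `c2` weight the attraction to the personal and global best positions.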


Gradient Descent

#artificialintelligence

Understanding the concept of the gradient is useful for understanding the logic of the gradient descent algorithm. Wikipedia's explanation of the concept of a stationary point is a good starting point. As it suggests, the gradient descent algorithm starts from a point on the cost function and, in each iteration, moves that point in the direction that lowers the cost, so that the derivative (slope) at the current point shrinks toward zero. The aim is to find the value whose slope is zero, which for a convex cost function is the minimum point. When the coordinate values of this point are substituted into the hypothesis function, the function we obtain is the hypothesis function of the model with the least error we can create.
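The loop described above can be sketched in a few lines for a one-dimensional cost function. The function name `gradient_descent`, the learning rate, the stopping tolerance, and the example cost $f(x) = (x-3)^2 + 1$ are assumptions for illustration.

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Step against the derivative until the slope is (almost) zero."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:   # slope close to zero: a stationary point has been reached
            break
        x -= lr * g        # move downhill, proportionally to the slope
    return x

# Cost f(x) = (x - 3)^2 + 1 has derivative f'(x) = 2 (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because this cost is convex, the stationary point found is the minimum point, so `x_min` ends up close to 3. On a non-convex cost the same loop may stop at a local minimum or a saddle point instead.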